skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Chen, Z"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available September 26, 2026
  2. Free, publicly-accessible full text available September 26, 2026
  3. Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs’ ability to manipulate, program, and reason about structured data, especially graphs. We introduce GraphEval36K1 , the first comprehensive graph dataset, comprising 40 graph coding problems and 36,900 test cases to evaluate the ability of LLMs on graph problem solving. Our dataset is categorized into eight primary and four sub-categories to ensure a thorough evaluation across different types of graphs. We benchmark ten LLMs, finding that private models outperform open-source ones, though the gap is narrowing. We also analyze the performance of LLMs across directed vs undirected graphs, different kinds of graph concepts, and network models. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on complex graph tasks. Results show that SSD improves the average passing rate of GPT-4, GPT4o, Gemini-Pro and Claude-3-Sonnet by 8.38%, 6.78%, 29.28% and 25.28%, respectively. 
    more » « less
    Free, publicly-accessible full text available April 29, 2026
  4. Vorobeychik, Y; Das, S; Nowé, A (Ed.)
    Free, publicly-accessible full text available June 5, 2026
  5. Free, publicly-accessible full text available December 15, 2025
  6. Free, publicly-accessible full text available December 15, 2025
  7. Free, publicly-accessible full text available December 10, 2025
  8. Free, publicly-accessible full text available December 10, 2025
  9. Free, publicly-accessible full text available December 9, 2025
  10. Large Language Models (LLMs) have achieved remarkable success in natural language tasks, yet understanding their reasoning processes re- mains a significant challenge. We address this by introducing XplainLLM, a dataset accom- panying an explanation framework designed to enhance LLM transparency and reliability. Our dataset comprises 24,204 instances where each instance interprets the LLM’s reasoning behavior using knowledge graphs (KGs) and graph attention networks (GAT), and includes explanations of LLMs such as the decoder- only Llama-3 and the encoder-only RoBERTa. XplainLLM also features a framework for gener- ating grounded explanations and the debugger- scores for multidimensional quality analysis. Our explanations include why-choose and why- not-choose components, reason-elements, and debugger-scores that collectively illuminate the LLM’s reasoning behavior. Our evaluations demonstrate XplainLLM’s potential to reduce hallucinations and improve grounded explana- tion generation in LLMs. XplainLLM is a re- source for researchers and practitioners to build trust and verify the reliability of LLM outputs. Our code and dataset are publicly available. 
    more » « less
    Free, publicly-accessible full text available November 12, 2025